3 research outputs found
THORN: Temporal Human-Object Relation Network for Action Recognition
Most action recognition models treat human activities as unitary events.
However, human activities often follow a certain hierarchy. In fact, many human
activities are compositional. Also, these actions are mostly human-object
interactions. In this paper we propose to recognize human action by leveraging
the set of interactions that define an action. In this work, we present an
end-to-end network: THORN, that can leverage important human-object and
object-object interactions to predict actions. This model is built on top of a
3D backbone network. The key components of our model are: 1) An object
representation filter for modeling object. 2) An object relation reasoning
module to capture object relations. 3) A classification layer to predict the
action labels. To show the robustness of THORN, we evaluate it on
EPIC-Kitchen55 and EGTEA Gaze+, two of the largest and most challenging
first-person and human-object interaction datasets. THORN achieves
state-of-the-art performance on both datasets
MultiMediate'23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions
Automatic analysis of human behaviour is a fundamental prerequisite for the
creation of machines that can effectively interact with- and support humans in
social interactions. In MultiMediate'23, we address two key human social
behaviour analysis tasks for the first time in a controlled challenge:
engagement estimation and bodily behaviour recognition in social interactions.
This paper describes the MultiMediate'23 challenge and presents novel sets of
annotations for both tasks. For engagement estimation we collected novel
annotations on the NOvice eXpert Interaction (NOXI) database. For bodily
behaviour recognition, we annotated test recordings of the MPIIGroupInteraction
corpus with the BBSI annotation scheme. In addition, we present baseline
results for both challenge tasks.Comment: ACM MultiMedia'2
THORN: Temporal Human-Object Relation Network for Action Recognition
International audienceMost action recognition models treat human activities as unitary events. However, human activities often follow a certain hierarchy. In fact, many human activities are compositional. Also, these actions are mostly human-object interactions. In this paper we propose to recognize human action by leveraging the set of interactions that define an action. In this work, we present an end-to-end network: THORN, that can leverage important human-object and object-object interactions to predict actions. This model is built on top of a 3D backbone network. The key components of our model are: 1) An object representation filter for modeling object. 2) An object relation reasoning module to capture object relations. 3) A classification layer to predict the action labels. To show the robustness of THORN, we evaluate it on EPIC-Kitchen55 and EGTEA Gaze+, two of the largest and most challenging first-person and human-object interaction datasets. THORN achieves state-of-the-art performance on both datasets